688 research outputs found

    Simple and Effective Visual Models for Gene Expression Cancer Diagnostics

    In this paper we show that diagnostic classes in cancer gene expression data sets, which most often include thousands of features (genes), can be effectively separated with simple two-dimensional plots such as scatterplots and radviz graphs. The principal innovation proposed in the paper is a method called VizRank, which scores candidate projections and identifies the best among possibly millions of them for visualization. Compared with techniques widely applied in cancer genomics in recent years, including neural networks, support vector machines and various ensemble-based approaches, VizRank is fast and finds visualization models that domain experts can easily examine and interpret. Our experiments on a number of gene expression data sets show that VizRank was always able to find data visualizations with a small number of genes (two to seven) and excellent class separation. In addition to providing grounds for gene expression cancer diagnosis, VizRank and its visualizations also identify small sets of relevant genes, uncover interesting gene interactions and point to outliers and potential misclassifications in cancer data sets.

    VizRank: Data Visualization Guided by Machine Learning

    Data visualization plays a crucial role in identifying interesting patterns in exploratory data analysis. Its use is, however, made difficult by the large number of possible data projections, each showing a different attribute subset, that the data analyst must evaluate. In this paper, we introduce a method called VizRank, which is applied to classified data to automatically select the most useful data projections. VizRank can be used with any visualization method that maps attribute values to points in a two-dimensional visualization space. It assesses possible data projections and ranks them by their ability to visually discriminate between classes. The quality of class separation is estimated by computing the predictive accuracy of a k-nearest neighbor classifier on the data set consisting of the x and y positions of the projected data points and their class information. The paper introduces the method and presents experimental results which show that VizRank's ranking of projections agrees closely with subjective rankings by data analysts. The practical use of VizRank is also demonstrated by an application in the field of functional genomics.
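
    The scoring idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scatterplot projections only (one attribute pair per projection), uses leave-one-out accuracy as the accuracy estimate, and the function names and toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def knn_loo_accuracy(points, labels, k=3):
    """Leave-one-out accuracy of a k-NN classifier on projected 2-D points."""
    correct = 0
    for i, (xi, yi) in enumerate(points):
        # Squared Euclidean distance from point i to every other point.
        dists = sorted(
            ((xi - xj) ** 2 + (yi - yj) ** 2, labels[j])
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        # Majority vote among the k nearest neighbours.
        votes = Counter(label for _, label in dists[:k])
        if votes.most_common(1)[0][0] == labels[i]:
            correct += 1
    return correct / len(points)


def rank_scatterplots(rows, labels, k=3):
    """Score every attribute pair and return (score, pair) tuples, best first."""
    n_attrs = len(rows[0])
    scored = []
    for a, b in combinations(range(n_attrs), 2):
        projection = [(row[a], row[b]) for row in rows]
        scored.append((knn_loo_accuracy(projection, labels, k), (a, b)))
    return sorted(scored, reverse=True)


# Toy data (made up): attributes 0 and 1 separate the classes, attribute 2 does not.
rows = [
    (0.0, 0.1, 5.0),
    (0.1, 0.0, 1.0),
    (0.2, 0.2, 3.0),
    (1.0, 1.1, 5.1),
    (1.1, 1.0, 1.1),
    (1.2, 1.2, 3.1),
]
labels = ["a", "a", "a", "b", "b", "b"]

ranking = rank_scatterplots(rows, labels, k=3)
for score, pair in ranking:
    print(pair, round(score, 2))
```

    On this toy data the pair of attributes 0 and 1 ranks first. An exhaustive loop over pairs is only feasible for a handful of attributes, and the abstract does not say how VizRank searches among millions of candidates, so that part is left out here.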

    Interaktivna interakcijska analiza (Interactive Interaction Analysis)

    Interactions can be understood as correlations that involve more than just two attributes. A group of attributes is in interaction if their mutual relationships cannot be fully understood without observing all of them at once. Interactions are regularities among groups of several attributes. In this article we measure the importance of an interaction with procedures based on Shannon entropy, a notion of uncertainty that is more general than the concept of statistical variance. The goal of interaction analysis is to present interactions to the analyst graphically, using several types of diagrams. To this end we built tools that enable interactive exploration of data and help in finding interesting views of the data. Interactions also bring a new perspective on some problems of machine learning methods.
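
    As an illustration of an entropy-based interaction measure, the sketch below computes the quantity I(A,B;C) - I(A;C) - I(B;C) for three discrete attributes. The abstract does not give the exact formula the article uses, so treating this quantity as the measure is an assumption, and the function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(*columns):
    """Shannon entropy in bits of the joint distribution of the given columns."""
    joint = Counter(zip(*columns))
    n = sum(joint.values())
    return -sum(count / n * log2(count / n) for count in joint.values())


def mutual_information(x_cols, y_cols):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), where X and Y are lists of columns."""
    return entropy(*x_cols) + entropy(*y_cols) - entropy(*x_cols, *y_cols)


def interaction_gain(a, b, c):
    """Information about c that a and b provide only when observed together."""
    return (
        mutual_information([a, b], [c])
        - mutual_information([a], [c])
        - mutual_information([b], [c])
    )


# XOR: neither a nor b alone says anything about c, but together they determine it.
a = [0, 0, 1, 1]
b = [0, 1, 0, 1]
c = [x ^ y for x, y in zip(a, b)]

gain = interaction_gain(a, b, c)
print(gain)
```

    The XOR case is the standard example of attributes whose relationship cannot be understood without observing all of them at once: each attribute alone shares no information with c, while the pair shares one full bit.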

    Set-up and installation of a waterproof housing for a CCD based sky scanner

    To simulate daylight correctly as early as the planning phase, we need accurate data on the sky and on the distribution of sky brightness throughout the year. For this reason we developed a sky brightness meter (sky scanner) that enables continuous measurement of the sky brightness distribution and classification of the CIE sky type. The sky scanner is built around a CCD camera in a waterproof housing. Unlike other sky brightness meters, it captures the whole image of the sky in one shot, at an adjustable interval. [1] In this undergraduate thesis we focused on constructing the waterproof housing, which will serve for long-term sky measurements. A heater and silica gel are also installed in the housing to prevent humidity-related problems.

    VizRank: Finding Informative Data Projections in Functional Genomics by Machine Learning

    VizRank is a tool that finds interesting two-dimensional projections of class-labeled data. When applied to multi-dimensional functional genomics data sets, VizRank can systematically find relevant biological patterns.

    A calibrated measure to compare fluctuations of different entities across timescales

    © 2020 The Authors. Published by Springer. This is an open access article available under a Creative Commons licence. The published version can be accessed at the following link on the publisher’s website: https://doi.org/10.1038/s41598-020-77660-4
    A common way to learn about a system’s properties is to analyze temporal fluctuations in associated variables. However, conclusions based on fluctuations from a single entity can be misleading when used without proper reference to other comparable entities or when examined only on one timescale. Here we introduce a method that uses predictions from a fluctuation scaling law as a benchmark for the observed standard deviations. Differences from the benchmark (residuals) are aggregated across multiple timescales using Principal Component Analysis to reduce data dimensionality. The first component score is a calibrated measure of fluctuations: the reactivity RA of a given entity. We apply our method to activity records from the media industry using data from the Event Registry news aggregator: over 32M articles on selected topics published by over 8000 news outlets. Our approach distinguishes between different news outlet reporting styles: high reactivity points to activity fluctuations larger than expected, reflecting a bursty reporting style, whereas low reactivity suggests a relatively stable reporting style. Combining our method with the political bias detector Media Bias/Fact Check we quantify the relative reporting styles for different topics of mainly US media sources grouped by political orientation. The results suggest that news outlets with a liberal bias tended to be the least reactive while conservative news outlets were the most reactive.
    The work was partially supported as RENOIR Project by the European Union Horizon 2020 research and innovation programme under the Marie Skłodowska–Curie Grant Agreement No. 691152 and by Ministry of Science and Higher Education (Poland), Grant Nos. 34/H2020/2016, 329025/PnH/2016 and by National Science Centre, Poland Grant No. 2015/19/B/ST6/02612. J.A.H. was partially supported by the Russian Scientific Foundation, Agreement #17-71-30029 with co-financing of Bank Saint Petersburg and by POB Research Centre Cybersecurity and Data Science of Warsaw University of Technology within the Excellence Initiative Program - Research University (IDUB). Published onlin
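
    The pipeline described in this abstract (scaling-law benchmark, residuals, first principal component) can be sketched roughly as follows. This is a simplified illustration, not the published method: it assumes the scaling law is a power law fitted by least squares in log-log space, finds the first component by power iteration, and uses invented numbers and hypothetical function names.

```python
from math import log, sqrt


def scaling_residuals(means, stds):
    """Residuals of log(std) around a least-squares fit of log(std) = log(a) + b*log(mean)."""
    xs = [log(m) for m in means]
    ys = [log(s) for s in stds]
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def first_component_scores(rows, iterations=200):
    """Scores of each row on the first principal component (power iteration)."""
    n, d = len(rows), len(rows[0])
    col_means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - col_means[j] for j in range(d)] for r in rows]
    cov = [
        [sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    v = [1.0] * d  # starting vector; assumed not orthogonal to the first component
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    if sum(v) < 0:  # orient the component so larger residuals give larger scores
        v = [-x for x in v]
    return [sum(r[j] * v[j] for j in range(d)) for r in centred]


# Invented activity statistics for five entities at two timescales.
# Entity 2 fluctuates more than the scaling law predicts at both timescales.
means = [10.0, 50.0, 100.0, 500.0, 1000.0]
stds_short = [m ** 0.5 * f for m, f in zip(means, [1, 1, 3, 1, 1])]
stds_long = [2 * m ** 0.6 * f for m, f in zip(means, [1, 1, 2, 1, 1])]

residual_columns = [scaling_residuals(means, s) for s in (stds_short, stds_long)]
residual_rows = [list(row) for row in zip(*residual_columns)]
scores = first_component_scores(residual_rows)
print([round(s, 3) for s in scores])
```

    In this toy setting the entity with inflated fluctuations receives the highest score, which mirrors the abstract's reading of high reactivity as a bursty style. The sign convention and the reuse of the same means at both timescales are simplifications made for the example.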

    First international workshop on recent trends in news information retrieval (NewsIR’16)

    The news industry has gone through seismic shifts in the past decade with digital content and social media completely redefining how people consume news. Readers check for accurate fresh news from multiple sources throughout the day using dedicated apps or social media on their smartphones and tablets. At the same time, news publishers rely more and more on social networks and citizen journalism as a frontline to breaking news. In this new era of fast-flowing instant news delivery and consumption, publishers and aggregators have to overcome a great number of challenges. These include the verification or assessment of a source’s reliability; the integration of news with other sources of information; real-time processing of both news content and social streams in multiple languages, in different formats and in high volumes; deduplication; entity detection and disambiguation; automatic summarization; and news recommendation. Although Information Retrieval (IR) applied to news has been a popular research area for decades, fresh approaches are needed due to the changing type and volume of media content available and the way people consume this content. The goal of this workshop is to stimulate discussion around new and powerful uses of IR applied to news sources and the intersection of multiple IR tasks to solve real user problems. To promote research efforts in this area, we released a new dataset consisting of one million news articles to the research community and introduced a data challenge track as part of the workshop.

    Measurement of the $t\bar{t}t\bar{t}$ production cross section in $pp$ collisions at $\sqrt{s} = 13$ TeV with the ATLAS detector

    A measurement of four-top-quark production using proton-proton collision data at a centre-of-mass energy of 13 TeV collected by the ATLAS detector at the Large Hadron Collider corresponding to an integrated luminosity of 139 fb$^{-1}$ is presented. Events are selected if they contain a single lepton (electron or muon) or an opposite-sign lepton pair, in association with multiple jets. The events are categorised according to the number of jets and how likely these are to contain b-hadrons. A multivariate technique is then used to discriminate between signal and background events. The measured four-top-quark production cross section is found to be $26^{+17}_{-15}$ fb, with a corresponding observed (expected) significance of 1.9 (1.0) standard deviations over the background-only hypothesis. The result is combined with the previous measurement performed by the ATLAS Collaboration in the multilepton final state. The combined four-top-quark production cross section is measured to be $24^{+7}_{-6}$ fb, with a corresponding observed (expected) signal significance of 4.7 (2.6) standard deviations over the background-only predictions. It is consistent within 2.0 standard deviations with the Standard Model expectation of $12.0 \pm 2.4$ fb.